Vol 10 Issue 01, JAN 2020 ISSN NO: 0364-4308

# Clock gating and a carry select adder are the foundations of this low-power arithmetic logic unit (ALU) design

K Divya Lakshmi<sup>1</sup>, P Subbaraidu<sup>2</sup>, G V Sanjeeva Reddy<sup>3</sup>, R Aruna<sup>4</sup>

1,2,3,4 Asst. Professor, Department of ECE, K. S. R. M College of Engineering (A), Kadapa

# **ABSTRACT:**

General-purpose personal computers (desktops and laptops) consume considerably more power than their predecessors, on the order of a few watts, because of the increasing complexity and speed of their CPUs. The ALU is essential to the operation of any CPU: it performs the arithmetic and logical calculations. As the operations inside the CPU grow more complex, the ALU becomes more intricate, more expensive, larger, and more power-hungry. The power consumption of the ALU is therefore a major issue in CPU design. This work proposes reducing ALU power consumption with latch-free clock gating, which keeps only the required part of the design active and switches off the rest. Further efficiency gains are expected when a carry select adder (CSLA) is used as the central computational element of the arithmetic unit. The structure of the CSLA leaves room for reducing both area and power. This work shows that the area and power of the CSLA can be significantly reduced by a single, simple modification at the gate level. Based on this modification, a 16-bit square root CSLA (SQRT CSLA) architecture was developed and evaluated. The proposed architecture has a slightly longer delay than the conventional SQRT CSLA but uses less area and power overall. The design and HDL implementation of a low-power 16-bit ALU on the Xilinx Spartan 3E FPGA are presented.
#### **KEYWORDS:**

Carry Select Adder (CSLA), ALU, Low Power, Dynamic Power, Clock Power, Clock Gating, FPGA, Hardware Description Language (HDL) Programming.

# INTRODUCTION

As technology advances, the number of transistors used by any particular digital logic design grows. The heat a device generates grows with its transistor count, which strongly affects its power requirements [1]. Reduced power consumption improves the battery life, performance, reliability, and heat-removal cost of today's portable devices, all of which are increasingly significant. This is why increasing a device's speed while decreasing its power consumption is always a top objective during the design phase. The ideal design would have high throughput, low power usage, and minimal area. In practice, however, area, speed, and power often have to be traded off against one another. Power optimization is possible across the whole digital design process, although the benefits are greatest at the algorithmic and architectural design stages.

Maximum speed, low power, and small area are the primary constraints that current microprocessor architectures must satisfy to meet their performance criteria. An arithmetic and logic unit (ALU) is a digital electronic circuit that performs arithmetic and bitwise logical operations on integer binary data. The ALU is the heart and main building block of all computationally demanding units, including the CPU, the FPU, and the GPU, and it is almost always in the data path when an instruction decoded by the instruction decoder is executed. To meet these requirements, an ALU with both high performance and low power consumption is necessary, so the ALU's power consumption must be kept as low as possible.
# DISSIPATION OF POWER AND ITS SOURCES

CPU power consumption has several sources: dynamic power consumption, short-circuit power consumption, and power loss due to transistor leakage currents:

$$P_{CPU} = P_{dyn} + P_{sc} + P_{leak}$$

In general, these contributions are grouped under two kinds of power.

For a low-power ALU architecture, an efficient adder in the propagation and generation block is needed. An elementary adder forms the sum for the current bit position only after the sum of the previous bit position has been produced and a carry has been passed into the next position. The Carry Select Adder (CSLA) is one of the fastest adders, and it is used in a number of data processing processors. Many computational systems use the CSLA to get around the problem of carry propagation delay by generating multiple carries independently and then selecting one of them to form the sum. The CSLA shown in Figure 1 uses multiple pairs of Ripple Carry Adders (RCA) to generate the partial sum and carry for both carry inputs, cin = 0 and cin = 1, and the final sum and carry are then selected by multiplexers (mux). The regular structure of the CSLA suggests that it can be made more area-efficient and more power-efficient. This work shows that the area and power of the CSLA can be significantly reduced by a single, simple modification at the gate level. The proposed design only slightly increases delay compared to the regular CSLA while using less area and power. Figure 2 shows an architecture that reduces the area and power of a regular CSLA by replacing the RCA with cin = 1 with a Binary to Excess-1 Converter (BEC) [6, 7]. The main benefit of this BEC logic is that it needs fewer logic gates than the n-bit Full Adder (FA) structure of the RCA.

There are many clock gating styles available to optimize power in VLSI circuits.
They can be:

- Latch-free based clock gating designs.
- Latch-based clock gating designs.

# CARRY SELECT ADDER REQUIRES LESS POWER AND OCCUPIES LITTLE SPACE

Designing a low-power ALU requires a highly efficient adder in the ALU's propagation and generation block. An elementary adder forms the sum for the current bit position only after the sum of the previous bit position has been formed and a carry has been passed into the following position. The Carry Select Adder (CSLA) is one of the fastest adders, so it is used in a wide range of data processing processors. The CSLA lets many computational systems avoid the carry propagation delay problem by independently generating multiple carries and selecting one of them to form the sum. However, the CSLA shown in Figure 1 is not especially area-efficient, because it uses multiple pairs of Ripple Carry Adders (RCA) to generate the partial sum and carry for both carry inputs, cin = 0 and cin = 1, before the final sum and carry are selected by multiplexers (mux). Because of its regular, modular structure, the CSLA lends itself to a variety of area and power improvements. This work proposes a straightforward way of reducing the CSLA's area and power consumption through a simple gate-level modification. The proposed design only slightly increases delay compared to the regular CSLA while requiring less area and less power. A regular CSLA uses an RCA with cin = 1; as shown in Figure 2, a Binary to Excess-1 Converter (BEC) can be used instead to reduce area and power consumption [6, 7]. The primary benefit of this BEC logic is that it needs fewer logic gates than the n-bit Full Adder (FA) structure of the RCA.

Figure 1 Regular structure of carry select adder

Figure 2 Proposed structure of carry select adder

Figure 3 Conventional ALU

The conventional ALU is shown in Figure 3 so that its performance can be compared with that of the proposed ALU.
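The BEC substitution described in this section can be illustrated with a small behavioural model. The Python sketch below is not the authors' HDL. It assumes a 16-bit SQRT CSLA split into groups of 2, 2, 3, 4 and 5 bits (the group sizes are an assumption, since the text does not state them), and the names `ripple_carry`, `bec` and `csla_bec_add` are hypothetical. Each group runs one RCA with cin = 0, derives the cin = 1 result by applying a BEC to the RCA's carry and sum bits, and a multiplexer picks one of the two according to the incoming carry.

```python
from typing import Tuple

# Assumed group widths for a 16-bit SQRT CSLA (not given in the text).
GROUP_WIDTHS = (2, 2, 3, 4, 5)


def ripple_carry(a: int, b: int, width: int, cin: int) -> Tuple[int, int]:
    """Add two width-bit values with a chain of full adders; return (sum, cout)."""
    total, carry = 0, cin
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def bec(value: int, width: int) -> int:
    """Binary to Excess-1 Converter: X0 = ~B0, Xi = Bi XOR (B0 AND ... AND B(i-1))."""
    out = 0
    lower_bits_all_one = 1
    for i in range(width):
        bit = (value >> i) & 1
        out |= (bit ^ lower_bits_all_one) << i
        lower_bits_all_one &= bit
    return out


def csla_bec_add(a: int, b: int, cin: int = 0) -> Tuple[int, int]:
    """16-bit carry select addition with a BEC in place of the cin = 1 RCA."""
    result, carry, offset = 0, cin, 0
    for width in GROUP_WIDTHS:
        mask = (1 << width) - 1
        group_a = (a >> offset) & mask
        group_b = (b >> offset) & mask
        # Single RCA per group, always evaluated with cin = 0.
        sum0, carry0 = ripple_carry(group_a, group_b, width, 0)
        # The BEC adds one to the (width + 1)-bit word {carry0, sum0}.
        plus_one = bec((carry0 << width) | sum0, width + 1)
        sum1, carry1 = plus_one & mask, plus_one >> width
        # Multiplexer: the incoming carry selects between the two candidates.
        group_sum, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= group_sum << offset
        offset += width
    return result, carry
```

For simplicity the model routes the first group through the same multiplexer path as the others; a hardware design would typically feed the external carry straight into the first RCA. Comparing `csla_bec_add` against ordinary integer addition over a range of operands is a quick way to confirm that the BEC path matches an RCA with cin = 1.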
The ALU has five inputs: A, B, sel, reset, and clk. The result of the ALU operation is returned on the output out. The ALU takes in two 16-bit data values, denoted A and B. The input sel[3:0] selects the ALU operation according to the opcode produced by the instruction decoder. The reset input controls the whole operation.

# **RESULTS**

Here we present the simulation and implementation results, followed by the data gathered for both the conventional and the proposed models. Figures 4 and 5 show the timing waveform and RTL schematic of the ALU without clock gating. Figures 6 and 7 show the timing waveform and RTL schematic of the adder used in the proposed model. Figures 8 and 9 show the timing waveform and RTL schematic of the ALU with clock gating.

Figure 4 Simulated waveform of conventional model.

Figure 5 RTL schematic of conventional model.

Figure 6 Simulated waveform of CSLA.

Figure 7 RTL schematic of CSLA.

Figure 8 Simulated waveform of proposed model.

Figure 9 RTL schematic of proposed model.

Table 1 Device utilization summary of conventional model (estimated values)

| Logic Utilization | Used | Available | Utilization |
|----------------------------|------|-----------|-------------|
| Number of Slices | 29 | 45 | 6% |
| Number of Slice Flip Flops | 150 | 972 | 1% |
| Number of 4 input LUTs | 58 | 932 | 5% |
| Number of bonded IOBs | 9 | 202 | 2% |
| Number of GCLKs | 1 | 34 | 46 |

Table 2 Device utilization summary of proposed model (estimated values)

| Logic Utilization | Used | Available | Utilization |
|----------------------------|------|-----------|-------------|
| Number of Slices | 387 | -65 | 85 |
| Number of Slice Flip Flops | 381 | 932 | 1% |
| Number of 4 input LUTs | 696 | 930 | 75 |
| Number of bonded IOBs | .57 | 232 | 26 |
| Number of GCLKs | 1 | 24 | 45 |

Tables 1 and 2 show the implementation results of both models on the Spartan 3E FPGA. They show that the clock gating technique adds extra hardware to the circuit. The resulting reduction in power is shown in Table 3.

Table 3 Power comparison of both models

| Parameter | Conventional model (mW) | Proposed model (mW) |
|-----------------|-------------------------|---------------------|
| Total power | 209 | 198 |
| Dynamic power | 130 | 119 |
| Quiescent power | 79 | 79 |

# **CONCLUSION**

Both models were designed in Verilog, simulated with the Xilinx ISE 12.1 design suite, and implemented on a Xilinx Spartan 3E FPGA, and their results were compared. Based on the findings, a 16-bit ALU that uses clock gating consumes less power than a conventional ALU. The proposed model's use of clock gating is therefore an effective strategy for saving clock and dynamic power, because it minimises switching activity on the data and clock buses of idle modules while the active module is operating. This demonstrates how power optimization, which was previously confined to the synthesis and place-and-route phases, has now moved up to the system level and RTL stages.

# **REFERENCES**

- [1] Anju S. Pillai, Ishan T.
B, "Factors Causing Power Consumption in an Embedded Processor", International Journal of Application and Innovation in Engineering and Management (IJAIEM), 2(7), July 2013.
- [2] Kanika Kaur and Arti Noor, "Strategies and Methodologies for Low Power VLSI Designs: A Review", International Journal of Advances in Engineering & Technology, 2011.
- [3] Ankit Mitra, "Design and Implementation of Low Power 16-bit ALU with Clock Gating", International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 2(6), June 2013.
- [4] J. Shinde and S. S. Salankar, "Clock Gating-A Power Optimizing Technique for VLSI Circuits", in Proc. Annual IEEE India Conference (INDICON), pp. 1–4, 2011.
- [5] V. Khorasani, B. V. Vahdat, and M. Mortazavi, "Design and Implementation of Floating Point ALU on a FPGA Processor", IEEE International Conference on Computing, Electronics and Electrical Technologies (ICCEET), pp. 772–776, 2012.
- [6] B. Ramkumar, H. M. Kittur, and P. M. Kannan, "ASIC Implementation of Modified Faster Carry Save Adder", Eur. J. Sci. Res., 42(1), pp. 53–58, 2010.
- [7] Y. Kim and L.-S. Kim, "64-Bit Carry-Select Adder with Reduced Area", Electron. Lett., 37(10), pp. 614–615, May 2001.
- [8] Bangaru Kalpana Amrut Anil Rao Purohit and R. Venkata Siva Reddy, "Area Optimization of SPI Module Using Verilog HDL", International Journal of Electronics and Communication Engineering & Technology, 7(3), 2016, pp. 38–45.
- [9] Kshitija S. Patil, G. D. Salunke, and Bhavana L. Mahajan, "FPGA Implemented Multichannel HDLC Transceiver", International Journal of Electronics and Communication Engineering & Technology, 3(3), 2012, pp. 170–176.
- [10] Devanshi S. Desai and Nagendra P. Gajjar, "Low Bitrate Modulator Using FPGA", International Journal of Electronics and Communication Engineering & Technology, 5(4), 2014, pp. 89–94.